Multi-view data containing complementary and consensus information can facilitate representation learning by exploiting the intact integration of multi-view features. Because most objects in the real world have underlying connections, organizing multi-view data as heterogeneous graphs helps extract latent information among different objects. Owing to its powerful capability to gather information from neighboring nodes, in this paper we apply the Graph Convolutional Network (GCN) to heterogeneous-graph data originating from multi-view data, a setting that is still under-explored in the field of GCN. To improve the quality of the network topology and alleviate the interference of noise introduced by graph fusion, some methods perform sorting operations before the graph convolution procedure. These GCN-based methods generally sort and select the most confident neighborhood nodes for each vertex, for example by picking the top-k nodes according to pre-defined confidence values. Nonetheless, this is problematic because the sorting operators are non-differentiable and the graph embedding learning is inflexible, which may result in blocked gradient computations and degraded performance. To cope with these issues, we propose a joint framework dubbed Multi-view Graph Convolutional Network with Differentiable Node Selection (MGCN-DNS), which consists of an adaptive graph fusion layer, a graph learning module and a differentiable node selection schema. MGCN-DNS accepts multi-channel graph-structural data as inputs and aims to learn more robust graph fusion through a differentiable neural network. The effectiveness of the proposed method is verified by rigorous comparisons with a considerable number of state-of-the-art approaches on multi-view semi-supervised classification tasks.
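Recent advances in vehicle-to-everything (V2X) communication technology have enabled autonomous vehicles to share sensory information to obtain better perception performance. With the rapid growth of autonomous vehicles and intelligent infrastructure, V2X perception systems will soon be deployed at scale, which raises a critical question: how can we evaluate and improve their performance under challenging traffic scenarios before real-world deployment? Collecting diverse, large-scale real-world test scenes seems to be the most straightforward solution, but it is expensive and time-consuming, and the collections can cover only a limited set of scenarios. To this end, we propose the first open adversarial scene generator, V2XP-ASG, which can produce realistic, challenging scenes for modern LiDAR-based multi-agent perception systems. V2XP-ASG learns to construct an adversarial collaboration graph and simultaneously perturbs the poses of multiple agents in an adversarial yet plausible manner. The experiments show that V2XP-ASG can effectively identify challenging scenes for a wide range of V2X perception systems. Meanwhile, by training on a limited number of generated challenging scenes, the accuracy of V2X perception systems can be further improved by 12.3% on challenging scenes and by 4% on normal scenes.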
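Assessing the blurriness of an object image is essential for improving the performance of object recognition and retrieval. The main challenge lies in the lack of abundant images with reliable labels and of effective learning strategies. Current datasets are labeled with a limited number of ambiguous quality levels. To overcome this limitation, we propose to label the rank relationships between pairs of images rather than their quality levels, since this is much easier for humans to label, and we establish a large-scale realistic face image blur assessment dataset with reliable labels. Based on this dataset, we propose a method that obtains blur scores using only pairwise rank labels as supervision. Moreover, to further improve performance, we propose a self-supervised method based on quadruplet ranking consistency to exploit unlabeled data more effectively. The supervised and self-supervised methods together constitute the final semi-supervised learning framework, which can be trained end to end. Experimental results demonstrate the effectiveness of our method.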
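As an illustration of supervision from pairwise rank labels, the following is a minimal sketch of a logistic (RankNet-style) ranking loss. The abstract does not state which loss the authors use, so the loss form and the function name `pairwise_rank_loss` are assumptions made for this example only.

```python
import math


def pairwise_rank_loss(score_blurrier, score_sharper):
    """Logistic ranking loss for one labeled pair (assumed form, not the authors').

    The label says the first image is blurrier than the second, so the loss is
    small when the model gives the first image the higher blur score.
    """
    margin = score_blurrier - score_sharper
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A correctly ordered pair is penalized less than a wrongly ordered one.
correct_order = pairwise_rank_loss(2.0, 0.0)
wrong_order = pairwise_rank_loss(0.0, 2.0)
```

Only the difference between the two scores enters the loss, which is why rank labels alone can train a scalar blur score without absolute quality levels.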
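Modern autonomous systems are designed for many challenging scenarios in which agents face unexpected events and complicated tasks. Disturbance noise in the control commands and the presence of unknown inputs can negatively affect robot performance. Previous research on joint input and state estimation studied the continuous and discrete cases separately and without any prior information. This paper combines continuous-space and discrete-space input estimation into a unified theory based on the Expectation-Maximum (EM) algorithm. By introducing prior knowledge of events as constraints, inequality-constrained optimization problems are formulated to determine a gain matrix or dynamic weights, yielding an optimal input estimate with lower variance and more accurate decision-making. Finally, statistical results from experiments show that, in continuous space, our algorithm improves the variance by 81% over KF and by 47% over RKF; a remarkable improvement in the probability of correct decisions by our input estimator in discrete space is also analyzed through experiments.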
speed among all existing VIS models, and achieves the best result among methods using a single model on the YouTube-VIS dataset. For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy. We hope that VisTR can motivate future research for more video understanding tasks.
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning the equality of frame importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
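The masking-and-weighting idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `importance_map`, `toy_evaluate`, the keep probability, and the per-frame normalization are all hypothetical, and `evaluate` stands in for the expensive step of retraining the IL model on the masked demonstrations and measuring its environment return.

```python
import random


def importance_map(num_frames, evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """Score each frame by the average return of the random masks that kept it."""
    rng = random.Random(seed)
    weighted_sum = [0.0] * num_frames
    kept_count = [0] * num_frames
    for _ in range(num_masks):
        # Randomly keep or drop every demonstration frame.
        mask = [rng.random() < keep_prob for _ in range(num_frames)]
        # Hypothetical callback: retrain on the masked demos, return the env return.
        episode_return = evaluate(mask)
        for i, kept in enumerate(mask):
            if kept:
                weighted_sum[i] += episode_return
                kept_count[i] += 1
    return [s / c if c else 0.0 for s, c in zip(weighted_sum, kept_count)]


def toy_evaluate(mask):
    """Toy stand-in for retraining: only frame 2 contributes to the return."""
    frame_value = [0.0, 0.0, 5.0, 0.0]
    return sum(v for v, kept in zip(frame_value, mask) if kept)


scores = importance_map(4, toy_evaluate)
```

In the toy setup the frame that actually drives the return receives the highest score, while the other frames receive a lower score because they were only sometimes kept together with it.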
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one seeks only deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
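To make "the weighted average of mean and percentile performances" concrete, here is a minimal sample-based sketch of such an objective for a fixed policy. It covers only the nominal, non-robust case and ignores the Wasserstein ambiguity set; the function name, the `weight` and `alpha` parameters, and the simple order-statistic percentile are assumptions made for illustration.

```python
def return_risk_objective(returns, weight, alpha):
    """Weighted average of the sample mean and the empirical alpha-percentile.

    weight = 1 recovers the pure mean criterion; weight = 0 recovers the pure
    percentile criterion. The percentile is a plain order statistic (assumed).
    """
    ordered = sorted(returns)
    mean = sum(ordered) / len(ordered)
    percentile = ordered[int(alpha * (len(ordered) - 1))]
    return weight * mean + (1.0 - weight) * percentile


# Equal weighting of the mean (3.0) and the worst-case sample (1.0).
value = return_risk_objective([5.0, 1.0, 3.0, 2.0, 4.0], weight=0.5, alpha=0.0)
```

Lowering `weight` makes the objective more sensitive to the low tail of the return distribution, which is the trade-off the return-risk model is tuning.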